Genome Cluster Database. A Sequence Family Analysis Platform for Arabidopsis and Rice

Authors

  • Kevin Horan
  • Josh Lauricha
  • Julia Bailey-Serres
  • Natasha Raikhel
  • Thomas Girke
Abstract

The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa ssp. japonica) were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods produced separate cluster sets with complementary properties, so that each compensates for the other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD), which offers interactive, user-friendly sequence family mining tools so that the plant science community can analyze any family of interest. An automated clustering pipeline keeps the information current as the annotations of the two genomes are updated and the clustering is improved. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.
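
The family-naming step described in the abstract can be illustrated with a minimal sketch. It assumes each cluster is a list of protein identifiers and that a mapping from protein identifier to molecular function gene ontology descriptions is available. The function name `name_cluster`, the identifiers, the fallback label, and the alphabetical tie-breaking rule are all hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter

def name_cluster(members, go_descriptions, default="unknown function"):
    """Return the most common molecular-function GO description in a cluster."""
    counts = Counter()
    for protein in members:
        # A protein may carry several GO annotations; count each
        # description at most once per protein.
        for desc in set(go_descriptions.get(protein, [])):
            counts[desc] += 1
    if not counts:
        # No member has a molecular-function annotation.
        return default
    best = max(counts.values())
    # Assumption: ties are broken alphabetically to keep the result deterministic.
    return min(d for d, c in counts.items() if c == best)

# Hypothetical example data; the identifiers are made up.
go_descriptions = {
    "P1": ["kinase activity", "ATP binding"],
    "P2": ["kinase activity"],
    "P3": ["ATP binding", "kinase activity"],
    "P4": [],
}
clusters = {"cluster_1": ["P1", "P2", "P3"], "cluster_2": ["P4"]}
names = {cid: name_cluster(m, go_descriptions) for cid, m in clusters.items()}
```

Here `cluster_1` would be named after the description shared by the most members, while `cluster_2`, which has no annotated members, falls back to the default label.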

Related articles

Isolation and molecular characterization of the RecQsim gene in Arabidopsis, rice (Oryza sativa) and rape (Brassica napus)

In any organism that reproduces sexually, DNA recombination plays vital roles in the generation of allelic diversity as well as in the preservation of genome fidelity. Genome fidelity is particularly important in plants because mutations occurring during the development of flowering plants are heritable and can be passed on to the next generation. One of the gene families that play crucial roles in ...

The Arabidopsis Unannotated Secreted Peptide Database, a Resource for Plant Peptidomics

In the era of genomics, if a gene is not annotated, it is not investigated. Due to their small size, genes encoding peptides are often missed in genome annotations. Secreted peptides are important regulators of plant growth, development, and physiology. Identification of additional peptide signals by sequence homology searches has had limited success due to sequence heterogeneity. A bioinformat...

Genome-wide analysis of basic/helix-loop-helix transcription factor family in rice and Arabidopsis.

The basic/helix-loop-helix (bHLH) transcription factors and their homologs form a large family in plant and animal genomes. They are known to play important roles in the specification of tissue types in animals. On the other hand, few plant bHLH proteins have been studied functionally. Recent completion of whole genome sequences of model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza...

PIP: a database of potential intron polymorphism markers

MOTIVATION: With the recent progress made in large-scale plant functional genome sequencing projects, a great amount of EST (expressed sequence tag) data is becoming available. With the help of complete genomic sequence information of model plants (rice and Arabidopsis), it is possible to predict the joints between adjacent exons after splicing (termed 'intron positions' for short) in homologou...

Publication date: 2005